fetch_etf_data.py - ChartsMaze EDL Pipeline

Overview

This standalone script fetches comprehensive data for all NSE-listed ETFs (Exchange Traded Funds) from the ScanX API, including fundamental metrics, performance data, and ETF-specific fields like expense ratios.

Why Standalone?

Different Asset Class: ETFs have unique characteristics compared to stocks
ETF-Specific Metrics: Requires fields like ExpenseRatio and MfCoCode not applicable to stocks
Separate Universe: ETF tracking is independent of equity stock analysis
Infrequent Updates: The ETF list changes far less often than stock data needs refreshing
Manual Trigger: Run it when you need to update the ETF universe or check for new ETF launches

What It Fetches

The script retrieves comprehensive fundamental and technical data for all NSE ETFs:

Fundamental Fields

Market Cap, P/E Ratio, P/B Ratio
Dividend Yield
Revenue, Revenue Growth (1 Year)
Net Profit Margin, EBITDA Margin
EPS, ROE, ROCE

ETF-Specific Fields

Expense Ratio: Annual fund operating expenses, expressed relative to the fund's assets
MfCoCode: Mutual Fund Company Code
ISIN: International Securities Identification Number

Technical Fields

Last Traded Price (LTP)
Price changes (1 month, 3 months, 1 year, 3 years, 5 years)
Moving Averages (SMA 50, SMA 200)
RSI (14-day)
5-year high and distance from it (High5yr, RtAwayFrom5YearHigh)
Volume

Output Files

etf_data_response.json

JSON file containing an array of ETF objects with all fetched fields. Saved in the current working directory.

Example Output Structure:

[
  {
    "Isin": "INF204KB14I2",
    "DispSym": "NIFTYBEES",
    "OgInst": "ETF",
    "Mcap": 25000,
    "Ltp": 215.50,
    "ExpenseRatio": 0.05,
    "MfCoCode": "123",
    "PricePerchng1year": 15.5,
    "DivYeild": 1.2,
    ...
  }
]
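As a rough illustration of consuming this output, the sketch below ranks records by ExpenseRatio. The helper name cheapest_etfs and the inline sample (including the symbols EXAMPLEETF and NORATIOETF) are hypothetical and not part of the script; the sample only mirrors the structure shown above.

```python
import json

def cheapest_etfs(records, n=5):
    """Return the n records with the lowest ExpenseRatio, skipping missing values."""
    with_ratio = [r for r in records if isinstance(r.get("ExpenseRatio"), (int, float))]
    return sorted(with_ratio, key=lambda r: r["ExpenseRatio"])[:n]

# Inline sample shaped like the example output; all values are illustrative.
sample = json.loads("""
[
  {"DispSym": "EXAMPLEETF", "Ltp": 100.0, "ExpenseRatio": 0.30},
  {"DispSym": "NIFTYBEES", "Ltp": 215.50, "ExpenseRatio": 0.05},
  {"DispSym": "NORATIOETF", "Ltp": 50.0, "ExpenseRatio": null}
]
""")

for etf in cheapest_etfs(sample):
    print(etf["DispSym"], etf["ExpenseRatio"])
```

Against real data, the sample would be replaced by the list loaded from etf_data_response.json with json.load().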

API Reference

`fetch_all_etf_data()`

Fetches all NSE ETFs from the ScanX API.

Parameters: None. This function takes no parameters.

Returns: None (writes its output to a file)

API Endpoint:

url = "https://ow-scanx-analytics.dhan.co/customscan/fetchdt"

Payload Configuration:

payload = {
    "data": {
        "sort": "Mcap",           # Sort by Market Cap
        "sorder": "desc",         # Descending order
        "count": 1000,            # Large count to get all ETFs
        "fields": [...],          # Fundamental + technical + ETF fields (full list under Source Code)
        "params": [
            {"field": "OgInst", "op": "", "val": "ETF"},  # ETF instrument type
            {"field": "Exch", "op": "", "val": "NSE"}     # NSE exchange
        ],
        "pgno": 0
    }
}

Key Parameters:

OgInst: ETF - Filters for Exchange Traded Funds only
Exch: NSE - Restricts to NSE-listed ETFs
count: 1000 - High count to capture all ETFs in a single request
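The payload described above could be assembled by a small helper like the one sketched here. build_etf_payload is a hypothetical function, not something the script defines; it reuses the same keys and filter values and drops repeated field names, since the field list in the source contains "DivYeild" twice.

```python
def build_etf_payload(fields, count=1000, pgno=0):
    """Build the request body for NSE ETFs (hypothetical helper, not in the script)."""
    unique_fields = list(dict.fromkeys(fields))  # drop duplicates, keep first-seen order
    return {
        "data": {
            "sort": "Mcap",        # sort by market cap
            "sorder": "desc",      # descending order
            "count": count,
            "fields": unique_fields,
            "params": [
                {"field": "OgInst", "op": "", "val": "ETF"},
                {"field": "Exch", "op": "", "val": "NSE"},
            ],
            "pgno": pgno,
        }
    }

payload = build_etf_payload(["Isin", "DispSym", "DivYeild", "DivYeild", "ExpenseRatio"])
print(payload["data"]["fields"])
```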

Error Handling:

Uses raise_for_status() to catch HTTP errors
Validates response structure before saving
Catches and prints all exceptions
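The structure check can also be written as a standalone function that raises instead of printing, which makes failures easier to detect from calling code. This is a minimal sketch; extract_etf_list is a hypothetical name, and the only assumption about the response is the one the script already makes (a top-level "data" key holding a list).

```python
def extract_etf_list(response_body):
    """Return the list under 'data', or raise ValueError if the shape is unexpected."""
    if not isinstance(response_body, dict):
        raise ValueError("expected a JSON object at the top level")
    records = response_body.get("data")
    if not isinstance(records, list):
        raise ValueError("expected 'data' to be a list of ETF records")
    return records

# A well-formed body passes; a malformed one is rejected with a clear message.
print(len(extract_etf_list({"data": [{"DispSym": "NIFTYBEES"}]})))
try:
    extract_etf_list({"data": None})
except ValueError as exc:
    print(f"rejected: {exc}")
```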

When to Run Manually

Initial Setup

Run once when setting up the pipeline to get the complete ETF universe.

New ETF Launches

Check when new ETFs are launched (track AMC announcements).
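One way to spot new launches is to compare two snapshots by ISIN. The sketch below assumes you kept a copy of the previous etf_data_response.json before re-running, since the script overwrites the file. new_etfs and the second ISIN are made up for illustration.

```python
def new_etfs(previous, current):
    """Return records in `current` whose Isin does not appear in `previous`."""
    known = {record.get("Isin") for record in previous}
    return [record for record in current if record.get("Isin") not in known]

old_snapshot = [{"Isin": "INF204KB14I2", "DispSym": "NIFTYBEES"}]
# Made-up ISIN and symbol, standing in for a newly listed ETF.
new_snapshot = old_snapshot + [{"Isin": "INF000000001", "DispSym": "EXAMPLEETF"}]

print([record["DispSym"] for record in new_etfs(old_snapshot, new_snapshot)])
```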

Quarterly Review

Run quarterly to catch any new ETF listings and update expense ratios.
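A quarterly cadence can be checked from the output file's modification time. This is only a sketch: is_stale is a hypothetical helper, and the 90-day threshold is an assumed stand-in for "quarterly".

```python
import os
import time

def is_stale(path, max_age_days=90):
    """True if the file is missing or was last modified more than max_age_days ago."""
    if not os.path.exists(path):
        return True
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds > max_age_days * 86400

if is_stale("etf_data_response.json"):
    print("ETF snapshot is missing or older than 90 days; consider re-running the script.")
```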

Portfolio Rebalancing

Before running ETF analysis or portfolio allocation scripts.

Usage

python3 fetch_etf_data.py

Expected Output:

Fetching ETF data from https://ow-scanx-analytics.dhan.co/customscan/fetchdt...
Successfully fetched 150 ETFs. Saved to etf_data_response.json

Source Code

import requests
import json
import os

def fetch_all_etf_data():
    url = "https://ow-scanx-analytics.dhan.co/customscan/fetchdt"
    
    # Payload for ETFs as specified by the user
    payload = {
        "data": {
            "sort": "Mcap",
            "sorder": "desc",
            "count": 1000,  # Large count to get all ETFs in one go
            "fields": [
                "Isin", "OgInst", "DispSym", "Mcap", "Pe", "DivYeild", 
                "Revenue", "Year1RevenueGrowth",
                "NetProfitMargin", "YoYLastQtrlyProfitGrowth", 
                "EBIDTAMargin", "Volume", "PricePerchng1year",
                "PricePerchng3year", "PricePerchng5year", "Ind_Pe", 
                "Pb", "DivYeild", "Eps", "DaySMA50CurrentCandle",
                "DaySMA200CurrentCandle", "DayRSI14CurrentCandle", 
                "ROCE", "MfCoCode", "Ltp", "Roe",
                "RtAwayFrom5YearHigh", "High5yr", "Sym", 
                "PricePerchng1mon", "PricePerchng3mon", "ExpenseRatio",
                "PledgeBenefit", "Rmp"
            ],
            "params": [
                {"field": "OgInst", "op": "", "val": "ETF"},
                {"field": "Exch", "op": "", "val": "NSE"}
            ],
            "pgno": 0
        }
    }

    headers = {
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Accept": "application/json, text/plain, */*",
        "Origin": "https://scanx.dhan.co",
        "Referer": "https://scanx.dhan.co/"
    }

    print(f"Fetching ETF data from {url}...")
    try:
        response = requests.post(url, json=payload, headers=headers, timeout=30)  # timeout avoids hanging indefinitely
        response.raise_for_status()
        
        data = response.json()
        
        if 'data' in data and isinstance(data['data'], list):
            cleaned_data = data['data']
            # Save the cleaned list to a JSON file
            output_file = "etf_data_response.json"
            with open(output_file, "w") as f:
                json.dump(cleaned_data, f, indent=4)
            print(f"Successfully fetched {len(cleaned_data)} ETFs. Saved to {output_file}")
        else:
            print("Response structure might be different than expected. Check raw response.")
            
    except Exception as e:
        print(f"Error fetching ETF data: {e}")

if __name__ == "__main__":
    fetch_all_etf_data()

Dependencies

requests: HTTP library for API calls
json: JSON serialization/deserialization
os: (imported but not used in current version)

This script uses hardcoded headers similar to fetch_fno_data.py. Consider refactoring to use get_headers() from pipeline_utils.py for better maintainability.
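The real get_headers() in pipeline_utils.py is not shown in this document, so its signature is unknown. The stand-in below, deliberately named get_headers_sketch, only illustrates the idea of building the headers in one shared place and letting individual scripts override entries.

```python
# Hypothetical stand-in; the actual pipeline_utils.get_headers() may differ.
def get_headers_sketch(extra=None):
    """Return the browser-like headers used by this script, plus optional overrides."""
    headers = {
        "Content-Type": "application/json",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
        "Accept": "application/json, text/plain, */*",
        "Origin": "https://scanx.dhan.co",
        "Referer": "https://scanx.dhan.co/",
    }
    if extra:
        headers.update(extra)  # per-script overrides win over the defaults
    return headers

headers = get_headers_sketch()
print(headers["Origin"])
```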

ETF-Specific Considerations

Expense Ratio

The ExpenseRatio field is critical for ETF evaluation. For passive ETFs tracking the same index, a lower expense ratio generally means better long-term net returns, because fees are deducted from the fund's performance every year.
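To make the compounding effect concrete, the sketch below compares two funds with the same gross return and different fees. It rests on two assumptions that are not confirmed by the source: that ExpenseRatio is expressed in percent (0.05 meaning 0.05%), and that the fee can be modeled as a simple deduction from the annual gross return. final_value and the 10% return are illustrative.

```python
def final_value(initial, gross_return, expense_ratio_pct, years):
    """Compound `initial` at (gross_return - expense ratio) per year.

    Assumptions: expense_ratio_pct is in percent, and the fee is modeled
    as a flat deduction from the gross annual return.
    """
    net_return = gross_return - expense_ratio_pct / 100
    return initial * (1 + net_return) ** years

low_cost = final_value(100_000, 0.10, 0.05, 20)
high_cost = final_value(100_000, 0.10, 0.50, 20)
print(f"0.05% fee: {low_cost:,.0f}")
print(f"0.50% fee: {high_cost:,.0f}")
print(f"difference after 20 years: {low_cost - high_cost:,.0f}")
```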

Tracking Error

While not directly fetched, you can calculate tracking error by comparing ETF price movements against its underlying index (from fetch_all_indices.py).
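Tracking error is commonly defined as the standard deviation of the difference between the ETF's returns and the index's returns. The sketch below assumes you already have two aligned price series of equal length. This script only stores a snapshot, so the series would have to come from elsewhere, and the numbers shown are made up.

```python
import statistics

def period_returns(prices):
    """Simple returns between consecutive observations."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]

def tracking_error(etf_prices, index_levels):
    """Sample standard deviation of (ETF return - index return) per period."""
    if len(etf_prices) != len(index_levels):
        raise ValueError("series must have the same length")
    if len(etf_prices) < 3:
        raise ValueError("need at least three observations")
    diffs = [e - i for e, i in
             zip(period_returns(etf_prices), period_returns(index_levels))]
    return statistics.stdev(diffs)

# Illustrative, made-up series of aligned closing values.
etf = [100.0, 101.0, 100.5, 102.0]
index = [1000.0, 1011.0, 1004.0, 1021.0]
print(f"per-period tracking error: {tracking_error(etf, index):.6f}")
```

The result is expressed per observation period. With daily data, a common convention is to annualize by multiplying by the square root of the number of trading days in a year.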

Liquidity

The Volume field helps assess ETF liquidity - important for entry/exit execution quality.
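A simple liquidity screen might approximate traded value as Volume multiplied by Ltp and keep only ETFs above a threshold. The units of Volume are not documented here, so treating it as a count of units traded is an assumption. traded_value, liquid_etfs, and the sample records are hypothetical.

```python
def traded_value(record):
    """Approximate traded value as Volume * Ltp; None if either field is missing."""
    volume, ltp = record.get("Volume"), record.get("Ltp")
    if not isinstance(volume, (int, float)) or not isinstance(ltp, (int, float)):
        return None
    return volume * ltp

def liquid_etfs(records, min_value):
    """Keep records whose approximate traded value is at least min_value."""
    return [r for r in records if (traded_value(r) or 0) >= min_value]

# Made-up records in the shape of the script's output.
records = [
    {"DispSym": "ETF_A", "Volume": 1000, "Ltp": 200.0},
    {"DispSym": "ETF_B", "Volume": 10, "Ltp": 50.0},
    {"DispSym": "ETF_C", "Ltp": 50.0},  # Volume missing
]
print([r["DispSym"] for r in liquid_etfs(records, min_value=100_000)])
```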

Integration Possibilities

While standalone, this data can be used for:

ETF screeners and comparison tools
Passive investment strategy builders
Tax-efficient portfolio construction
Sector/thematic exposure analysis